WHY “SHOULD” STATISTICIANS AND BUSINESSMEN MAXIMIZE “MORAL EXPECTATION”?

J. MARSCHAK

COWLES COMMISSION


1. Introduction

1.1. The word “should” in the title of this paper has the same meaning as in the following sentences: “In building a house, why should one act on the assumption that the floor area of a room is the product and not the sum of its length and width?”; “If all A are B and all B are C, why should one avoid acting as if all C were A?” People may often act contrary to these precepts or norms, but then we say that they do not act reasonably. To discuss a set of norms of reasonable behavior (or possibly two or more such sets, each set being consistent internally but possibly inconsistent with other sets) is a problem in logic, not in psychology. It is a normative, not a descriptive, problem.

1.2. The phrase “moral expectation” stems from the early students of probability who applied probabilities in their study of reasonable behavior of players in games of chance. Let the “prospect” $P$, that is, the probability distribution $P(X)$ of a random “outcome” $X$, depend upon a man’s decision (“strategy”) $S$:

$$P = P(X) = P(X; S). \tag{1.2:1}$$

Let the set $\mathfrak{X}$ of all possible outcomes $X$ be completely ordered by a relation $\mathfrak{q}$ (read: “as good as or better than”). Define a scalar function $u(X)$ on the set $\mathfrak{X}$ as follows: for any pair $X_1$ and $X_2$ in $\mathfrak{X}$,

$$u(X_1) \ge u(X_2) \quad \text{if and only if} \quad X_1 \mathrel{\mathfrak{q}} X_2. \tag{1.2:2}$$

Then $u(X)$ is called the utility of $X$. It is a random variable whose distribution depends on the distribution $P$ and hence on the strategy $S$. Its expected value,

$$E\,u(X) \mid P(X; S) = \mu_u(S), \quad \text{say}, \tag{1.2:3}$$

is called the moral expectation of $X$. Define a space $\mathfrak{S}$ whose elements $S$ represent possible strategies. The title of the paper asks whether it is reasonable always to choose as one’s strategy an element $S^*$ of $\mathfrak{S}$ whenever

$$\mu_u(S^*) > \mu_u(S'), \tag{1.2:4}$$

where $S'$ is any element of $\mathfrak{S}$ distinct from $S^*$.

1.3. The “precept,” always (that is, for any space $\mathfrak{S}$) to maximize moral expectation, leads to inconsistent results unless all the utility functions considered are linear transforms of each other (in which case utility is sometimes said to be “measurable”). This can be easily shown for the case of discrete probability distributions. Suppose $X$ can only take values $X_0, \dots, X_N$, and denote the corresponding probabilities by $p_0, \dots, p_N$. Let $u$ and $v$ be two utility functions as defined in (1.2:2), and suppose the space $\mathfrak{S}$ is such as to include a strategy for every distribution. In particular, let $S'$ result in probabilities $p'_0, \dots, p'_N$, and $S^*$ in $p^*_0, \dots, p^*_N$. Suppose that, following the “precept,” $S^*$ is not chosen in preference to $S'$; that is, by (1.2:4),

$$\mu_u(S^*) \le \mu_u(S'); \qquad \mu_v(S^*) \le \mu_v(S'). \tag{1.3:1}$$

Suppose, in addition, that neither is $S'$ chosen in preference to $S^*$. Then (1.3:1) becomes

$$\mu_u(S^*) = \mu_u(S'); \qquad \mu_v(S^*) = \mu_v(S');$$

$$\sum_{n=0}^{N} (p^*_n - p'_n)\,u(X_n) = 0 = \sum_{n=0}^{N} (p^*_n - p'_n)\,v(X_n).$$

This must remain true for prospects such that $p^*_n \neq p'_n$ for $n \le 2$ and $p^*_n = p'_n = 0$ for $n > 2$. Then the three equations

$$\sum_{n=0}^{2} (p^*_n - p'_n)\,u(X_n) = \sum_{n=0}^{2} (p^*_n - p'_n)\,v(X_n) = \sum_{n=0}^{2} (p^*_n - p'_n) \cdot 1 = 0$$

form a homogeneous linear system in the three $(p^*_n - p'_n)$, $n = 0, 1, 2$. Since this system has a solution that is not identically zero, its determinant vanishes, and hence the $u(X_n)$, $v(X_n)$, $1$ are linearly dependent, for any three arbitrarily chosen values of $n$. Therefore, there exist $\alpha$, $\beta$ such that

$$v(X_n) = \alpha + \beta\,u(X_n), \qquad n = 0, \dots, N.$$
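The argument can be checked numerically for three outcomes. The sketch below is an illustration, not part of the original paper, and the utility values in it are hypothetical. If $u$ is not constant on $X_0, X_1, X_2$, every shift $d = p^* - p'$ that preserves total probability and leaves $\mu_u$ unchanged is a multiple of the cross product of $(u(X_n))$ with $(1, 1, 1)$. A second utility function $v$ rates all such pairs as indifferent exactly when $d$ is also orthogonal to $(v(X_n))$, that is, when the determinant of the system vanishes and $v = \alpha + \beta u$.

```python
from fractions import Fraction


def cross(a, b):
    """Cross product of two 3-vectors; the result is orthogonal to both."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def indifference_shift(u):
    """Nonzero d = p* - p' over X_0, X_1, X_2 with sum(d) = 0 and sum(d * u) = 0.

    Assumes u is not constant, so these solutions form a one-dimensional space.
    """
    return cross(u, (1, 1, 1))


def same_indifference(u, v):
    """True if every pair of prospects that u rates indifferent is also indifferent under v."""
    return dot(indifference_shift(u), v) == 0


# Hypothetical utility values on the outcomes X_0, X_1, X_2.
u = (Fraction(-1), Fraction(0), Fraction(1))
v = tuple(1 + 2 * x for x in u)              # linear transform: v = 1 + 2u
w = tuple(Fraction(2) ** int(x) for x in u)  # nonlinear monotone transform: w = 2**u

print(same_indifference(u, v), same_indifference(u, w))
```

With these values the linear transform $v$ passes the check and the nonlinear transform $w$ fails it, in line with the conclusion that only linear transforms of $u$ are compatible with the precept.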

The linear dependence of the utility functions thus follows from the linear nature of the operator $E$ in (1.2:3). The following illustration may be useful. Suppose $\mathfrak{X}$ consists of three alternative sums of money: $-\$1$, $\$0$, $\$1$. Let $w = M(u)$ and $v = L(u) = 1 + 2u$ be, respectively, a nonlinear and a linear monotone increasing transform of a utility function $u(X)$. Let $S'$ and $S''$ be two strategies resulting, respectively, in two different probability distributions of $X$, $P'(X)$ and $P''(X)$. In the following table, the moral expectations $\mu(S')$ and $\mu(S'')$ are computed for the three different utility functions $u$, $w$ and $v$. Thus, of the two strategies $S'$ and $S''$, $S''$ (resulting in a smaller variance of $X$) is chosen when the utility function is $u$ or the linear transform $v$ of $u$. But when the utility function is the nonlinear monotone transform $w$ of $u$, a different strategy may be chosen, although the man maximizes his moral expectation.
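The same pattern can be reproduced with hypothetical numbers (these are not the values of the paper's table). The sketch assumes $u(X) = X$ on the outcomes $-1, 0, 1$, uses the linear transform $v = 1 + 2u$ from the text, and stands in for the unspecified $M$ with the nonlinear increasing transform $w = 2^u$. The distributions are invented: $P''$ puts all mass on $\$0$ (zero variance), while $P'$ spreads over $\pm\$1$.

```python
from fractions import Fraction as F

# Hypothetical prospects over the sums of money -$1, $0, $1.
P1 = {-1: F(11, 20), 0: F(0), 1: F(9, 20)}   # P'(X), strategy S': spread out
P2 = {-1: F(0), 0: F(1), 1: F(0)}            # P''(X), strategy S'': zero variance


def u(x):
    """Assumed utility u(X) = X."""
    return F(x)


def v(x):
    """Linear monotone transform from the text: v = 1 + 2u."""
    return 1 + 2 * u(x)


def w(x):
    """Hypothetical nonlinear monotone transform w = 2**u (here u(x) = x)."""
    return F(2) ** x


def moral_expectation(P, util):
    """mu(S) = sum of util(X) * P(X) over the outcomes."""
    return sum(util(x) * p for x, p in P.items())


def chosen(util):
    """The strategy with the larger moral expectation under util."""
    return "S'" if moral_expectation(P1, util) > moral_expectation(P2, util) else "S''"


for name, util in (("u", u), ("v", v), ("w", w)):
    print(name, moral_expectation(P1, util), moral_expectation(P2, util), chosen(util))
```

Under $u$ and $v$ the low-variance $S''$ wins ($-1/10 < 0$ and $4/5 < 1$), while under $w$ the ranking flips ($47/40 > 1$).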

1.4. Pascal’s [9] immortality wager was an early application of the precept to maximize moral expectation. Essentially, Pascal made four propositions, with the first three of which we shall not quarrel. First, assume that, since the chances for and against immortality are unknown, they are equal. Second, assume that, if there is immortality, then good life is followed by eternal bliss and bad life by eternal damnation. Regard these two sequences as outcomes $X'_1$ and $X''_1$, respectively, and denote the outcomes “good life followed by nothing” and “bad life followed by nothing” by $X'_2$ and $X''_2$, respectively. Consider “good life” and “bad life” as two strategies, $S'$ and $S''$. In effect, Pascal computes the following two expected values:

$$\mu_u(S') = \tfrac{1}{2}\,u(X'_1) + \tfrac{1}{2}\,u(X'_2),$$

$$\mu_u(S'') = \tfrac{1}{2}\,u(X''_1) + \tfrac{1}{2}\,u(X''_2).$$

Hence,

$$\mu_u(S') > \mu_u(S'') \quad \text{if} \quad u(X'_1) - u(X''_1) > u(X''_2) - u(X'_2).$$

For Pascal, the difference between the advantages of eternal bliss (following a short period of possibly tedious good life) and the disadvantages of eternal damnation (following a short though possibly not unpleasant bad life) exceeds the difference between the possible pleasures of sin and the possible inconveniences of virtue. These valuations (the utility function $u$) can be regarded as his third proposition and may be accepted. We shall be concerned with his fourth proposition: that, because $\mu_u(S')$ exceeds $\mu_u(S'')$, it is reasonable to choose $S'$.
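Pascal's comparison amounts to a two-line computation. The utility numbers below are hypothetical placeholders chosen only to satisfy his third proposition; the sketch checks that the direct comparison of the two expectations agrees with the difference form derived above.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # Pascal's first proposition: equal chances for and against immortality


def pascal_expectations(u1_good, u2_good, u1_bad, u2_bad):
    """Return (mu_u(S'), mu_u(S'')).

    u1_good = u(X'_1), u2_good = u(X'_2), u1_bad = u(X''_1), u2_bad = u(X''_2).
    """
    return HALF * u1_good + HALF * u2_good, HALF * u1_bad + HALF * u2_bad


# Hypothetical utilities: bliss, tedious virtue, damnation, pleasant sin.
u1_good, u2_good, u1_bad, u2_bad = 100, -1, -100, 2

mu_good, mu_bad = pascal_expectations(u1_good, u2_good, u1_bad, u2_bad)
# Difference form: u(X'_1) - u(X''_1) > u(X''_2) - u(X'_2)
difference_form = (u1_good - u1_bad) > (u2_bad - u2_good)
print(mu_good, mu_bad, mu_good > mu_bad, difference_form)
```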

1.5. In the particular case when the space $\mathfrak{X}$ of outcomes consists of alternative sums of money (as in 1.3, table), the moral expectation of the gain, $Eu(X)$, is contrasted with its “mathematical expectation,” $EX$, which also depends on $S$. In the Petersburg game, there exists a strategy $S'$, say, which makes $EX$ infinite, yet a reasonable player would not choose $S'$. To explain the paradox, Daniel Bernoulli stated that a reasonable man maximizes $Eu(X)$ and not $EX$, and that the function $u(X)$ has certain properties. See Menger [7].
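A small numerical sketch of the paradox. It assumes one common version of the game (payoff $2^k$ if the first head appears on toss $k$, which has probability $2^{-k}$) and, for $u$, Bernoulli's logarithmic utility of final wealth with a hypothetical initial wealth $w_0$. Every term of $EX$ contributes 1, so its partial sums grow without bound, while $E\,u(X)$ converges.

```python
import math


def expected_money(k_max):
    """Partial sum of EX: payoff 2**k with probability 2**-k, k = 1..k_max (each term is 1)."""
    return sum(2.0 ** -k * 2.0 ** k for k in range(1, k_max + 1))


def expected_log_utility(k_max, w0):
    """Partial sum of E ln(w0 + X); w0 is an assumed initial wealth."""
    return sum(2.0 ** -k * math.log(w0 + 2.0 ** k) for k in range(1, k_max + 1))


def certainty_equivalent(k_max, w0):
    """Sure gain c with ln(w0 + c) equal to the expected log utility of the game."""
    return math.exp(expected_log_utility(k_max, w0)) - w0


for k_max in (10, 20, 40):
    print(k_max, expected_money(k_max), round(certainty_equivalent(k_max, w0=0.0), 6))
```

With $w_0 = 0$ the log-utility series converges to $\sum_k k\,2^{-k} \ln 2 = 2\ln 2$, so the certainty equivalent approaches \$4 while the partial sums of $EX$ keep growing.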

1.6. In section 2 of this paper, the precept to maximize $Eu(X)$ will be related to problems facing statisticians and businessmen and, in fact, to human decisions in general. Section 3 gives as a necessary condition for the precept a postulate which may be called the Postulate of Substitution between Indifferent Prospects. In an earlier paper [4], this postulate (jointly with certain other postulates) was shown to be a sufficient condition for the precept of maximizing $Eu(X)$, valid for a nonempty class of utility functions, each element of which is a linear transform of any of the others. This postulate appears, thus, to be logically equivalent to the moral expectation precept (provided the other postulates are admitted). It is also possibly equivalent to certain postulates of von Neumann and Morgenstern [8]. For a comparison, see [4, section 7]. Finally, it is also equivalent to a postulate which Samuelson [11] recently formulated very succinctly and which he proposed to call the Special Independence Assumption. A comparison between the postulate and the economist’s concept of “independence between consumption goods” is contained in section 3 of the present paper.

1.7. In section 4 a very rough outline of a different approach will be attempted. A rule of Long Run Success is formulated (“in the long run, it pays to be reasonable”) by considering a strategy as a sequence of rules of action to be taken in response to future situations. It seems that the rule of Long Run Success is not equivalent to the precept to maximize moral expectation unless some further conditions are imposed upon the utility functions. No definitive results are available so far.

1.8. Note that the space $\mathfrak{S}$ of strategies can be conceived as including among its elements strategies consistent with the ordinary rules of logic and mathematics, and strategies not consistent with these rules. The distribution $P$ of outcomes $X$ and, therefore, the quantity $Eu(X)$ will depend on whether the decision maker is a good or a bad logician and arithmetician, on what kind of geometry he applies, etc. This kind of justification of a set of behavior norms, including norms of thinking and counting, was, I believe, occasionally attempted by pragmatist and evolutionary writers with some rule of long run success in mind: “If you act on the assumption that 2 times 2 equals 5, you (or your tribe or species) will, in some sense, fare worse in the long run than if you act on the assumption that 2 times 2 equals 4.”

2. Some concrete cases

2.1. In recent years, the theory of statistical inference has taken a remarkably “economic” turn. In choosing a rule for making observations (the design of a sample), money cost ($C$) is subtracted from what may be called the gross gain ($G$) derived by the statistician or his “employer” from the knowledge acquired from the observations. $G$ is conceived as a sum of money and is, in a simple case, the larger the smaller the error of the estimation based on the sample. The money sum $G - C = X$ is thus the outcome of the statistician’s decision to choose a certain sample design. $X$ is called the net gain or profit. $X$ is to be maximized with respect to the variable under the statistician’s control, that is, with respect to the design of the sample.

2.2. In a particular case when the sample designs under consideration differ only with respect to the size of the sample (number of observations) $S$, the best value $S = S^*$ must satisfy the approximate rule

$$\frac{dG}{dS} = \frac{dC}{dS}$$

(provided $G$ and $C$ can be approximated by differentiable functions of $S$). This is the familiar rule of the economists: to equalize the marginal monetary product and the marginal monetary cost of the “input” $S$. More generally, one defines the space $\mathfrak{S}$ of all possible sample designs and maximizes the scalar function $X(S)$ over this space.
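For a concrete instance, the sketch below assumes a hypothetical gross gain $G(S) = g_0 - b/S$ (the gain falls short of $g_0$ by an amount proportional to the sampling variance of a mean, which shrinks like $1/S$) and a linear cost $C(S) = c_0 + cS$. The marginal rule $b/S^2 = c$ then gives $S^* = \sqrt{b/c}$, and a direct search over integer sample sizes agrees. Here $G$ is still treated as a known function of $S$, ignoring the randomness discussed next.

```python
import math


def profit(S, g0, b, c0, c):
    """X(S) = G(S) - C(S) with hypothetical G(S) = g0 - b/S and C(S) = c0 + c*S."""
    return (g0 - b / S) - (c0 + c * S)


def best_sample_size(g0, b, c0, c, S_max=10_000):
    """Brute-force maximization of X(S) over integer sample sizes 1..S_max."""
    return max(range(1, S_max + 1), key=lambda S: profit(S, g0, b, c0, c))


def marginal_rule(b, c):
    """Solve dG/dS = dC/dS, i.e. b / S**2 = c, for S."""
    return math.sqrt(b / c)


print(best_sample_size(g0=100.0, b=400.0, c0=5.0, c=1.0), marginal_rule(400.0, 1.0))
```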

2.3. However, the profit $X$ is a random scalar, since the gross gain $G$ depends on the values that the observed random variables happen to take. (In addition, the cost $C$, too, may depend on observed values, as, for example, when $C$ depends on the location of individuals that happen to fall into a social survey sample, or when the number of observations depends on observed values, as in sequential sampling.) One cannot maximize the random variable $X$, but one can maximize some quantity depending on its distribution, for example, its mean $EX$.

2.4. Both concepts, the sample design and the monetary profit, can be replaced by wider ones. As regards the first: the statistician can recommend not only the rule of making observations but also the decision to be taken after having collected them. This decision may be the choice of an estimate or of a hypothesis. More generally, it may be any decision that will influence the probability distribution of the gross gain $G$, for example, the decision to buy a certain quantity of a commodity. Generalizing the notation of 2.2, one defines $\mathfrak{S}$ as the space of all possible “strategies” $S$, each strategy being a certain rule for making observations and for taking decisions on the basis of these observations. The distribution of the random profit $X$, to be denoted by $P(X)$, will depend on $S$ and on the true distribution $F$ of the observables. We can write

$$P(X) = P(X; S, F), \qquad EX = EX \mid (S, F) = \mu_X(S, F), \quad \text{say}. \tag{2.4:1}$$


(In the simple case of 2.1, $X$ depended on the estimation error, that is, the difference between a point estimate and the true value of the estimated parameter of the distribution of observables. This was obviously a special case of the one now stated.) The negative of the function $\mu_X$ just defined is identical with Wald’s [12] “risk function,” with two differences: first, Wald always has $G \le 0$ (regarding $-G$, the “loss suffered by the statistician,” as a nonnegative quantity), a trivial difference; second, Wald does not necessarily regard $G$, $C$ and $X$ as monetary quantities and presumably accepts the generalization that we are going to make now.

2.5. With the “statistician” taking over entrepreneurial decisions, it becomes necessary to reconsider what is of concern to the businessman. To begin with, “A full purse is not as good as an empty one is bad.” There exists a certain quantity $K$ (possibly zero) which depends on the firm’s reserves and is such that, if $X \le K$, the firm is bankrupt and must be dissolved. It is reasonable that the probability of the occurrence of this situation should be made as small as possible. This objective may not be reached if the strategy chosen maximizes $EX$. On the other hand, suppose the firm tries to maximize the expression $Eu(X)$, where the “utility function” $u(X)$ is defined as follows:

$$u(X) = \begin{cases} -\nu & \text{when } X \le K, \\ X & \text{when } X > K, \end{cases} \tag{2.5:1}$$

where $\nu$ is a positive constant. Let the probability density function of $X$ for a given strategy $S$ be $p(X; S)$. Then the expected value of $u(X)$ given $S$ is

$$Eu(X; S) = -\alpha(\nu + \beta) + \beta, \tag{2.5:2}$$

where

$$\alpha = \int_{-\infty}^{K} p(X; S)\,dX = \text{probability of bankruptcy},$$

$$\beta = \frac{\int_{K}^{\infty} X\,p(X; S)\,dX}{\int_{K}^{\infty} p(X; S)\,dX} = \text{profit averaged over all cases other than bankruptcy}.$$

(Indeed, $Eu(X; S) = -\nu\alpha + (1 - \alpha)\beta$, which is (2.5:2).) Both $\alpha$ and $\beta$ depend on $S$, and $\beta$ is usually nonnegative. It follows from (2.5:2) that, for a given $\beta$, the firm’s moral expectation is the larger the smaller is $\alpha$, the probability of bankruptcy (since $\nu + \beta > 0$ whenever $\beta \ge 0$). The maximization of $Eu(X)$ would thus seem to describe reasonable behavior better than the maximization of $EX$.
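A discrete analogue of (2.5:1) and (2.5:2), with probabilities in place of the density $p(X; S)$. The two prospects, the threshold $K = 0$ and the penalty $\nu = 10$ are hypothetical. Strategy A has the higher mean profit $EX$ but a 20 per cent chance of bankruptcy; the moral expectation nevertheless ranks it below strategy B. The function also checks the identity $Eu = -\alpha(\nu + \beta) + \beta$ on each call.

```python
from fractions import Fraction as F


def utility(x, K, nu):
    """(2.5:1): -nu on bankruptcy (x <= K), the profit x itself otherwise."""
    return -nu if x <= K else x


def summary(prospect, K, nu):
    """Return (EX, alpha, beta, Eu) for a discrete prospect {profit: probability}."""
    ex = sum(x * p for x, p in prospect.items())
    alpha = sum(p for x, p in prospect.items() if x <= K)      # probability of bankruptcy
    survive = sum(p for x, p in prospect.items() if x > K)
    beta = (sum(x * p for x, p in prospect.items() if x > K) / survive
            if survive else F(0))                               # mean profit given survival
    eu = sum(utility(x, K, nu) * p for x, p in prospect.items())
    assert eu == -alpha * (nu + beta) + beta                    # identity (2.5:2)
    return ex, alpha, beta, eu


K, nu = 0, 10
A = {-5: F(1, 5), 10: F(4, 5)}   # hypothetical risky strategy
B = {1: F(1, 2), 12: F(1, 2)}    # hypothetical safe strategy
print(summary(A, K, nu))
print(summary(B, K, nu))
```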

2.6. Another example of the utility function of a random sum of money, and of the effect of properties of this function upon the choice of strategy, was given in the table in 1.3. A much discussed case has been that of a function $u(X)$ that is differentiable at least twice. If $u'(X) > 0$ and $u''(X) < 0$ for all values of $X$ (the case of “decreasing and positive marginal utility of income”), then the individual who maximizes $Eu(X)$ will prefer, given the value of the mean profit $EX$, a prospect with a low variance to a prospect with a high variance of $X$. This is seen by expanding $Eu(X)$ into a Taylor series. See [6] and, for a somewhat more general case, [2].
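A quick check of the Taylor argument under assumed specifics: $u(X) = \ln X$ (so $u' > 0$ and $u'' < 0$) and two hypothetical fifty-fifty prospects with the same mean, 100, but different variances. To second order, $Eu(X) \approx u(EX) + \tfrac{1}{2}u''(EX)\operatorname{var}X$; since $u'' < 0$, the smaller variance gives the larger moral expectation.

```python
import math


def expected_utility(prospect, u):
    """E u(X) for a discrete prospect given as [(value, probability), ...]."""
    return sum(p * u(x) for x, p in prospect)


def second_order(prospect, u, u2):
    """Taylor approximation u(EX) + u''(EX) * var(X) / 2."""
    mean = sum(p * x for x, p in prospect)
    var = sum(p * (x - mean) ** 2 for x, p in prospect)
    return u(mean) + 0.5 * u2(mean) * var


def log_second_derivative(x):
    """u''(x) for u = ln."""
    return -1.0 / x ** 2


low = [(90.0, 0.5), (110.0, 0.5)]    # variance 100
high = [(50.0, 0.5), (150.0, 0.5)]   # variance 2500

for name, pr in (("low", low), ("high", high)):
    print(name, expected_utility(pr, math.log), second_order(pr, math.log, log_second_derivative))
```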

2.7. More generally, the businessman is concerned not with a scalar quantity but with a vector: with the sequence of profits to be earned in successive years, the more distant ones being possibly of less import to the firm’s decision (even if the joint probability distribution of the sequence is known exactly) than the more immediate ones; or, if he is a farmer, with a set of quantities earned, respectively, in the form of food, housing accommodation, etc. The fact that nonmonetary earnings and future earnings of any kind can be converted into current money at prices and interest rates prevailing in (perfect) markets does not dispose of the complication, since these prices and interest rates themselves have to be explained by the strategies of the people that transact in these markets. However, the generalization from a scalar to a vector $X = \{x_i\}$ does not present difficulties. In particular, if $u(X)$ is twice differentiable, the results mentioned in 2.6 are easily generalized. Write the vector

$$\mu = \{E x_i\}$$

and the matrix

$$\sigma = \|\sigma_{ij}\| = \|E(x_i - \mu_i)(x_j - \mu_j)\|.$$

Then, if

$$u_{ij} = \frac{\partial^2 u}{\partial x_i\,\partial x_j}$$

exists for all $i, j$, we have, by expanding about $\mu$,

$$Eu(X) = u(\mu) + \tfrac{1}{2}\sum_i \sum_j u_{ij}\,\sigma_{ij} + \cdots.$$

(The first-order terms vanish because $E(x_i - \mu_i) = 0$; the $u_{ij}$ are evaluated at $\mu$.) It follows that if $u_{ii} < 0$ (decreasing marginal utility of the $i$-th kind of commodity earned or of the $i$-th year’s money profit), then the individual will try to make the variance $\sigma_{ii}$ as small as possible, given the other elements of $\mu$ and $\sigma$. A high correlation between the $i$-th and the $j$-th elements of the profit vector will be feared if $u_{ij} < 0$ (the case of goods with negative complementarity); a high correlation between $i$ and $j$ will be desired if $u_{ij} > 0$ (the case of positively complementary goods).
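For a quadratic utility function the expansion above is exact, which makes it easy to check. The sketch assumes the hypothetical $u(x_1, x_2) = x_1 + x_2 - \tfrac{1}{2}x_1^2 - \tfrac{1}{2}x_2^2 + c\,x_1 x_2$, so that $u_{11} = u_{22} = -1$ and $u_{12} = u_{21} = c$, and compares two invented joint prospects with the same means and variances but opposite correlations. With $c > 0$ (positive complementarity) the positively correlated prospect has the higher moral expectation.

```python
from fractions import Fraction as F

c = F(1, 2)  # hypothetical cross derivative u_12 > 0


def u(x1, x2):
    return x1 + x2 - (x1 ** 2 + x2 ** 2) * F(1, 2) + c * x1 * x2


HESSIAN = ((F(-1), c), (c, F(-1)))  # u_ij, constant for a quadratic u


def expected_u(prospect):
    """Exact E u(X) for a discrete prospect [((x1, x2), probability), ...]."""
    return sum(p * u(*x) for x, p in prospect)


def expansion(prospect):
    """u(mu) + (1/2) * sum_ij u_ij * sigma_ij."""
    mu = [sum(p * x[i] for x, p in prospect) for i in range(2)]
    sigma = [[sum(p * (x[i] - mu[i]) * (x[j] - mu[j]) for x, p in prospect)
              for j in range(2)] for i in range(2)]
    second = sum(HESSIAN[i][j] * sigma[i][j] for i in range(2) for j in range(2))
    return u(*mu) + second / 2


positive = [((1, 1), F(1, 2)), ((-1, -1), F(1, 2))]   # correlation +1
negative = [((1, -1), F(1, 2)), ((-1, 1), F(1, 2))]   # correlation -1

for name, pr in (("positive", positive), ("negative", negative)):
    print(name, expected_u(pr), expansion(pr))
```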

2.8. Note that the definition of complementarity just used is only possible because, as a corollary of the requirement to maximize moral expectation $Eu(X)$, all utility functions form a group of linear transforms. If a nonlinear transform of $u(X)$, say the function $w(X) = f[u(X)]$ with $f' > 0$, were admitted as a utility function, then, since a positive $u_{ij}$ might be consistent with a negative $w_{ij}$ (indeed $w_{ij} = f'(u)\,u_{ij} + f''(u)\,u_i u_j$, where $u_i = \partial u/\partial x_i$, and the second term can outweigh the first), the sign of complementarity as just defined could not be ascertained.
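A numerical instance of the sign reversal, with hypothetical choices: $u(x_1, x_2) = x_1 + x_2 + x_1 x_2/10$, so $u_{12} = 1/10 > 0$ everywhere, and the increasing transform $f(u) = -e^{-u}$. By the chain rule, $w_{12} = e^{-u}(u_{12} - u_1 u_2)$, which is negative at $(1, 1)$. A central finite difference confirms the analytic value.

```python
import math


def u(x1, x2):
    return x1 + x2 + x1 * x2 / 10


def w(x1, x2):
    """Nonlinear increasing transform w = f(u) with f(u) = -exp(-u)."""
    return -math.exp(-u(x1, x2))


def cross_partial(g, x1, x2, h=1e-4):
    """Central finite-difference estimate of d^2 g / dx1 dx2."""
    return (g(x1 + h, x2 + h) - g(x1 + h, x2 - h)
            - g(x1 - h, x2 + h) + g(x1 - h, x2 - h)) / (4 * h * h)


def w12_analytic(x1, x2):
    """Chain rule: w_12 = f'(u) u_12 + f''(u) u_1 u_2 = exp(-u) * (u_12 - u_1 u_2)."""
    u1, u2, u12 = 1 + x2 / 10, 1 + x1 / 10, 1 / 10
    return math.exp(-u(x1, x2)) * (u12 - u1 * u2)


print(cross_partial(u, 1.0, 1.0), cross_partial(w, 1.0, 1.0), w12_analytic(1.0, 1.0))
```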

2.9. In [5], the desirability of large or small variances of, or correlations between, inputs (production factors) of various kinds was studied on lines similar to 2.7. Rational advantages of “pooling the risks” and of either specialization or diversification of production (depending on the sign of complementarity between factors of production) can be derived, always assuming that it is rational to maximize $Eu(X)$.

2.10. Certain things desired by the businessman, such as power or social position or reputation, are not quantities at all. It is therefore necessary to generalize further the concept of what is being maximized. This generalization was presented in 1.2. If the space $\mathfrak{X}$ of “outcomes” and the space $\mathfrak{S}$ of strategies are defined, this permits us to take care of all human decisions, transcending conventional economics and including the private man’s choice of profession or wife, the legislator’s choice of election tactics or national policies, or military and administrative decisions. This is of some interest to statisticians who, after all, are not all employed by profit making organizations.

2.11. Our question is, then, whether the following rule is reasonable: always choose a strategy $S^*$ so as to obtain a prospect (a probability distribution)

$$P(X) = P(X; S^*, F),$$

for which

$$\mu_u(S^*, F) = Eu(X) \mid (S^*, F)$$

is larger than or equal to

$$\mu_u(S', F) = Eu(X) \mid (S', F),$$

for any $S'$; where the scalar function $u(X)$ is defined over the space $\mathfrak{X}$ of outcomes, $S^*$ and $S'$ are elements of the space $\mathfrak{S}$ of strategies, and $F$ is the true distribution of the observables. In general, $F$ is not (or not completely) known, in the case of the statistician as well as in the case of any other decision maker. The economist F. H. Knight [3] called the case of unknown $F$ “presence of uncertainty” and the case of known $F$ (as in games of chance) “presence of risk.” This terminology has been widely accepted among economists, although it is not in line with general scientific usage of the word uncertainty. It would be better to speak of “incomplete” versus “complete” information. For the purpose of this paper, it is sufficient to deal with the case of complete information, that is, to assume $F$ known. It is of no relevance to us here whether, in the case of $F$ unknown, probabilities of the several alternatives have to be assumed equal before maximizing $\mu_u(S, F)$ with respect to $S$, as was done by Pascal (1.4 above), or whether, following Wald and the authors of the Theory of Games, one should assume the least favorable distribution $F$, that is, minimize $\mu_u(S, F)$ with respect to $F$ before maximizing it with respect to $S$. It is the latter, the maximization of $\mu_u$ with respect to $S$, that concerns us here: a problem common to the case of complete and to that of incomplete information, and arising with Pascal as with Wald.
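The two ways of handling an unknown $F$ mentioned above can be contrasted on a toy table of moral expectations $\mu_u(S, F)$. The numbers, and the restriction of $F$ to two candidate distributions, are hypothetical; the minimization is over these candidates only, a simplification of Wald's least favorable distribution. In both rules the last step is the same maximization with respect to $S$.

```python
# Hypothetical moral expectations mu_u(S, F) for two strategies and two candidate F.
MU = {
    "S1": {"F1": 10.0, "F2": 0.0},
    "S2": {"F1": 4.0, "F2": 3.0},
}


def equal_probabilities(mu):
    """Pascal: give the candidate F equal probabilities, then maximize over S."""
    return max(mu, key=lambda s: sum(mu[s].values()) / len(mu[s]))


def maximin(mu):
    """Wald-style: minimize over the candidate F first, then maximize over S."""
    return max(mu, key=lambda s: min(mu[s].values()))


print(equal_probabilities(MU), maximin(MU))  # S1 S2
```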

3. The rule of substitution between indifferent prospects

3.1. In [4], a certain set of behavior postulates, numbered I-IV, was shown by the author to imply the proposition that there exists a class of utility functions, linear transforms of each other, and such that for every utility function the expected value of utility is maximized. We shall presently restate these postulates and show that, if postulates I-III (which appear rather mild) are accepted, then postulate IV follows from the condition that $Eu(X)$ is maximized. A joint result of the two papers is, then, that under certain weak conditions, postulate IV and the rule of maximizing moral expectation are equivalent. Postulate IV will be called the “rule of substitution between indifferent prospects.”

3.2. As in [4], we define a space $\mathfrak{P}$ of prospects $P$ (= probability distributions) for the case that the space $\mathfrak{X}$ of outcomes consists of a finite number of elements, $X_0, X_1, \dots, X_N$. (The case of $N$ infinite was discussed by Rubin in [10] for the problem of [4].) Regard the probability that the particular outcome $X_n$ will occur, $p(X_n)$, as a coordinate of the point $P$ in the Euclidean $N$-space, $n = 1, \dots, N$. Since each $p(X_n) \ge 0$ and

$$0 \le \sum_{n=1}^{N} p(X_n) = 1 - p(X_0) \le 1, \tag{3.2:1}$$

the space $\mathfrak{P}$ of prospects is the domain of the $N$-space bounded by and including the surface of an $(N+1)$-hedron whose vertices are the origin $(0, 0, \dots, 0)$ and the ends $(1, 0, 0, \dots, 0), (0, 1, 0, \dots, 0), \dots, (0, 0, 0, \dots, 1)$ of the $N$ unit vectors. These vertices represent the “sure” prospects, each promising with certainty one particular outcome. Sure prospects will be denoted by $P^{(0)}, P^{(1)}, \dots, P^{(N)}$. Using letters $P, Q, R, \dots$ for prospects in general, we state the following postulates:

I. The space $\mathfrak{P}$ is completely ordered by the relation $\mathfrak{q}$ (read “as good as or better than”). Note: Whenever $P \mathrel{\mathfrak{q}} Q$ we shall also write $u(P) \ge u(Q)$. This defines a scalar “utility function” $u(P)$ on the space $\mathfrak{P}$. It is related in a simple way to the function $u(X)$ on the space $\mathfrak{X}$, defined in 1.2 above, that is, we have $u(X_n) = u(P^{(n)})$, $n = 0, 1, \dots, N$, by definition. Furthermore, whenever $P \mathrel{\mathfrak{q}} Q$ and not $Q \mathrel{\mathfrak{q}} P$, we write $P \mathrel{\mathfrak{p}} Q$ (read “$P$ preferred to $Q$”) and have $u(P) > u(Q)$. Whenever $P \mathrel{\mathfrak{q}} Q$ and $Q \mathrel{\mathfrak{q}} P$, we write $P \mathrel{\mathfrak{i}} Q$ (read “$P$ and $Q$ are indifferent”) and have $u(P) = u(Q)$.

II. $\mathfrak{q}$ is continuous, that is, if $P \mathrel{\mathfrak{q}} Q \mathrel{\mathfrak{q}} R$, then there exists a number $r$, $0 \le r \le 1$, such that if $Q' = rP + (1 - r)R$ then $Q \mathrel{\mathfrak{i}} Q'$. Note: We are using the symbols for addition, multiplication and equality in their ordinary meaning. $Q'$ is the result of ordinary multiplication and addition performed on two vectors, $P$ and $R$. Geometrically, $Q'$ is a point on a straight line connecting $P$ and $R$.

III. There exist $P$ and $Q$ not on the boundary of $\mathfrak{P}$, such that not $P \mathrel{\mathfrak{i}} Q$. Note: This is postulate III* of [4]. We refer to [4] for an alternative postulate and for a stronger one.

IV. If $P \mathrel{\mathfrak{i}} Q$ and $0 < r < 1$, and if $P' = rR + (1 - r)P$ and $Q' = rR + (1 - r)Q$, then $P' \mathrel{\mathfrak{i}} Q'$. Note: $P'$ is the prospect of having either prospect $P$ or prospect $R$ with certain probabilities, $1 - r$ and $r$. $Q'$ is the prospect of having either $Q$ or $R$, also with probabilities $1 - r$ and $r$. Postulate IV states that if $P$ and $Q$ are indifferent, so are $P'$ and $Q'$. In a special case $P$ and $P'$ may coincide [4, postulate IVa]. Also, $P$ or $Q$ or $R$ may be sure prospects.

3.3. In the language of the last section (see the note to postulate I), the rule of maximizing $Eu(X)$ can be stated thus [writing for brevity $u(P^{(n)}) = u^{(n)}$, which is independent of the prospects considered, and $P(X_n) = p_n$, $Q(X_n) = q_n$, etc.]:

Proposition 1. $Q \mathrel{\mathfrak{q}} R$ if and only if $\sum_{n=0}^{N} u^{(n)} q_n \ge \sum_{n=0}^{N} u^{(n)} r_n$.

We shall prove that this rule implies

Proposition 2. If $P \mathrel{\mathfrak{p}} Q \mathrel{\mathfrak{p}} R$, then the indifference sets $J(P)$, $J(Q)$, $J(R)$ are segments of three parallel $(N - 1)$-dimensional hyperplanes, contained in $\mathfrak{P}$ and stacked in the order $P$, $Q$, $R$, or its reverse,

where for any prospect $P$ the indifference set $J(P)$ is defined as consisting of all prospects $P'$ for which $P' \mathrel{\mathfrak{i}} P$.

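In fact, if $P \mathrel{\mathfrak{p}} Q \mathrel{\mathfrak{p}} R$ then, by the definition of the relations $\mathfrak{p}$ and $\mathfrak{i}$, proposition 1 implies that

$$\sum_{n=0}^{N} u^{(n)} p_n > \sum_{n=0}^{N} u^{(n)} q_n > \sum_{n=0}^{N} u^{(n)} r_n. \tag{3.3:1}$$

If, in addition, $P \mathrel{\mathfrak{i}} P'$, $Q \mathrel{\mathfrak{i}} Q'$ and $R \mathrel{\mathfrak{i}} R'$, proposition 1 requires that

$$\begin{aligned}
\sum_{n=0}^{N} u^{(n)} p_n &= \sum_{n=0}^{N} u^{(n)} p'_n = a^*, \text{ say};\\
\sum_{n=0}^{N} u^{(n)} q_n &= \sum_{n=0}^{N} u^{(n)} q'_n = b^*, \text{ say};\\
\sum_{n=0}^{N} u^{(n)} r_n &= \sum_{n=0}^{N} u^{(n)} r'_n = c^*, \text{ say};
\end{aligned}$$

where $a^* > b^* > c^*$. Write $v^{(n)} = u^{(n)} - u^{(0)}$, $n = 1, 2, \dots, N$. Since $\sum_{n=0}^{N} u^{(n)} p_n = u^{(0)} + \sum_{n=1}^{N} v^{(n)} p_n$ once $p_0$ is replaced by $1 - \sum_{n=1}^{N} p_n$ (and similarly for the other prospects), the above equations become, upon replacing $a^* - u^{(0)}$ by $a$, etc.,

$$\begin{aligned}
\sum_{n=1}^{N} v^{(n)} p_n &= \sum_{n=1}^{N} v^{(n)} p'_n = a,\\
\sum_{n=1}^{N} v^{(n)} q_n &= \sum_{n=1}^{N} v^{(n)} q'_n = b,\\
\sum_{n=1}^{N} v^{(n)} r_n &= \sum_{n=1}^{N} v^{(n)} r'_n = c,
\end{aligned} \tag{3.3:2}$$

with $a > b > c$. The last two terms in the first line of (3.3:2) give the equation of an $(N - 1)$-dimensional hyperplane which contains all $P'$ such that $P' \mathrel{\mathfrak{i}} P$. This hyperplane is therefore identical with the indifference set $J(P)$. Similarly, the other two lines in (3.3:2) provide the equations for the hyperplanes $J(Q)$ and $J(R)$. The three hyperplanes are parallel, each having the $N$ direction cosines proportional to $v^{(n)}$, $n = 1, 2, \dots, N$. The distances of the three hyperplanes from the origin are proportional to $a$, $b$ and $c$, respectively. Hence they are stacked in the order $P$, $Q$, $R$ or its reverse.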
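The geometry of Proposition 2 for $N = 2$ (three outcomes, prospects as points $(p_1, p_2)$ in the plane), with hypothetical utilities $u^{(0)} = 2$, $u^{(1)} = 3$, $u^{(2)} = 5$ and invented prospects. The indifference sets are segments of the lines $v^{(1)}p_1 + v^{(2)}p_2 = \text{const}$ with the common normal $(v^{(1)}, v^{(2)})$; their signed distances from the origin are the constants divided by the length of that normal. The sketch also checks the identity $\sum_{n=0}^{N} u^{(n)} p_n = u^{(0)} + \sum_{n=1}^{N} v^{(n)} p_n$ used above.

```python
import math
from fractions import Fraction as F

U = (F(2), F(3), F(5))                    # hypothetical u^(0), u^(1), u^(2)
V = tuple(un - U[0] for un in U[1:])      # v^(n) = u^(n) - u^(0), n = 1..N
NORM = math.sqrt(sum(float(vn) ** 2 for vn in V))


def level(p):
    """sum_{n=1}^N v^(n) p_n for a prospect with coordinates p = (p_1, ..., p_N)."""
    return sum(vn * pn for vn, pn in zip(V, p))


def moral_expectation(p):
    """sum_{n=0}^N u^(n) p_n, with p_0 = 1 - sum(p)."""
    p0 = 1 - sum(p)
    return U[0] * p0 + sum(un * pn for un, pn in zip(U[1:], p))


def signed_distance(p):
    """Signed distance from the origin of the indifference hyperplane through p."""
    return float(level(p)) / NORM


direction_cosines = [float(vn) / NORM for vn in V]

# Hypothetical prospects: P i P' and Q i Q' (equal levels), with P p Q p R.
P, P_ = (F(0), F(1, 2)), (F(3, 4), F(1, 4))
Q, Q_ = (F(1, 2), F(0)), (F(0), F(1, 6))
R = (F(0), F(0))                          # the sure prospect P^(0)

print(direction_cosines)
print([signed_distance(x) for x in (P, Q, R)])
```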

3.4. Proposition 2 implies, in turn, postulate IV, provided postulates I-III are granted. In fact, in the notation used in the statement of postulate IV, prospects $P$ and $Q$ must both lie on $J(P)$, one of the parallel indifference planes revealed in proposition 2. Moreover, the indifference planes $J(P')$ and $J(Q')$ must be parallel to $J(P)$. On the other hand, $P'$ and $Q'$ must both lie on a plane parallel to $J(P)$ because $r = PP'/PR = QQ'/QR$. And since in a Euclidean space there is only one plane through $P'$ parallel to $J(P)$, we have $P' \mathrel{\mathfrak{i}} Q'$.
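The step can also be checked directly. Under Proposition 1, indifference means equal level $\sum_n v^{(n)} p_n$, and the level is linear in the coordinates, so mixing two prospects of equal level with a common third prospect $R$ in the same proportions yields equal levels again. The sketch assumes hypothetical values $v^{(1)} = 1$, $v^{(2)} = 3$ and samples random (hypothetical) prospects $R$ and weights $r$.

```python
import random
from fractions import Fraction as F

V = (F(1), F(3))  # hypothetical v^(1), v^(2) (N = 2)


def level(p):
    """sum_{n=1}^N v^(n) p_n; equal levels mean indifference under Proposition 1."""
    return sum(vn * pn for vn, pn in zip(V, p))


def mix(r, R, P):
    """Coordinates of rR + (1 - r)P: R with probability r, P with probability 1 - r."""
    return tuple(r * rn + (1 - r) * pn for rn, pn in zip(R, P))


def random_prospect(rng):
    """A random point of the triangle p_1, p_2 >= 0, p_1 + p_2 <= 1 (rational coordinates)."""
    a, b = sorted(F(rng.randint(0, 100), 100) for _ in range(2))
    return (a, b - a)


rng = random.Random(0)
P, Q = (F(0), F(1, 2)), (F(3, 4), F(1, 4))  # P i Q: both have level 3/2
for _ in range(1000):
    R = random_prospect(rng)
    r = F(rng.randint(1, 99), 100)          # 0 < r < 1
    assert level(mix(r, R, P)) == level(mix(r, R, Q))  # postulate IV: P' i Q'
print("postulate IV holds on all sampled cases")
```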

The reasoning of this and the preceding sections presupposes the validity of postulates I-III. Postulate I excludes the case in which neither $P \mathrel{\mathfrak{q}} Q$ nor $Q \mathrel{\mathfrak{q}} P$; this case would rule out the existence of a function $u(P)$ on $\mathfrak{P}$. Postulate II excludes the possibility of “holes” in the indifference planes. Postulate III excludes the possibility that the whole interior of $\mathfrak{P}$ might form a single indifference set.

We conclude that, given postulates I-III, postulate IV is not only a sufficient condition for proposition 1 (as was shown in [4]), but also a necessary condition. Thus, the two are equivalent, provided postulates I-III are granted. Postulates I-III seem indeed weak enough as a description of reasonable behavior. Postulate IV also seems reasonable to me, and some further remarks in 3.5 may convince others. If, in addition, this postulate is taken to be intuitively more convincing than its logical equivalent, proposition 1, then the question asked in the title is answered.

3.5. In the discussion of the theory of choice between prospects, confusion seems
to arise through the use of an ambiguous word, “combination.” This word naturally
expresses the operation “and,” as in “A and C.” But it has also been applied
to express the relation “either ... or,” as in “either A or C, with probabilities r and
1 - r.” Since prospects are always mutually exclusive, they cannot be “combined”
into an object of choice such as “prospect A and prospect C” to be chosen
in preference to another “combination” such as “prospect B and prospect C.”
But they can be “combined” in a different sense: the combination “either
A or C, with probabilities r and 1 - r” can be formed and may be chosen in preference
to the combination “either B or C, with probabilities q and 1 - q.” Suppose,
on the other hand, that A, B, C are objects that are not mutually exclusive. For
example, A = country house, B = city house, C = car. Then the following rule
of behavior would not be reasonable: “If I like A as well as I like B, then I like A
and C as well as I like B and C.” Such a rule would neglect the possibility that a
car has more use in the country than in the city. But such a rule is not our postulate
IV. The latter says, rather: “Call P the prospect of having the country house
A and having, in addition, certain other things which will be held constant throughout
the comparisons and which we shall call D; call Q the prospect of having the
city house B and D. Then, if I like P as well as I like Q, I like the prospect of having
either P or R as well as I like the prospect of having either Q or R, provided
the odds are the same in each case.” This is reasonable. The fact that the car is of
better use when possessed jointly with a country house than when possessed jointly
with a city house has no bearing on the choice between the prospects discussed, none
of which promises the joint possession of a car and a house. To exclude more completely
from our minds the logically illegitimate picture of a joint occurrence of
mutually exclusive events, postulate IV might be illustrated most simply as follows:
P = the career of a professor, Q = the career of a church minister, R = the
career of a bus driver. Or imagine that you have to choose between drawing from
one or the other of two urns; in each of them, 20 per cent of all tickets are inscribed
“car”; the remaining 80 per cent are inscribed “country house” in the first urn
and “city house” in the second urn. If you are indifferent between
city house and country house, will you prefer one urn to the other, knowing that
if you do get a car in either lottery you will not get a house with it? The same
naturally applies when P, Q, and R are themselves nonsure prospects, for example,
if they are tickets to three different lotteries.
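The urn argument can be checked with a short Python sketch. It assumes, for illustration only, that a prospect is a finite distribution over mutually exclusive outcomes and that the utilities below (5 for either house, 3 for the car) are hypothetical numbers; any assignment giving the two houses equal utility leads to the same conclusion. The helper `mix` (a name introduced here) forms the prospect “either P or R, with probabilities r and 1 - r” and reduces it to a distribution over outcomes, so no car-and-house pair ever appears.

```python
from fractions import Fraction
from typing import Dict

# A prospect: a finite probability distribution over mutually exclusive outcomes.
Prospect = Dict[str, Fraction]


def sure(outcome: str) -> Prospect:
    """The sure prospect that `outcome` occurs with certainty."""
    return {outcome: Fraction(1)}


def mix(r: Fraction, P: Prospect, R: Prospect) -> Prospect:
    """'Either P or R, with probabilities r and 1 - r', reduced to outcomes."""
    out: Prospect = {}
    for outcome, prob in P.items():
        out[outcome] = out.get(outcome, Fraction(0)) + r * prob
    for outcome, prob in R.items():
        out[outcome] = out.get(outcome, Fraction(0)) + (1 - r) * prob
    return out


def expected_utility(P: Prospect, u: Dict[str, Fraction]) -> Fraction:
    """Moral expectation of prospect P under the utility assignment u."""
    return sum((prob * u[outcome] for outcome, prob in P.items()), Fraction(0))


# Hypothetical utilities: the two houses are equally good; the car's value is arbitrary.
u = {"country house": Fraction(5), "city house": Fraction(5), "car": Fraction(3)}

# Each urn: 80 per cent house tickets, 20 per cent car tickets.
first_urn = mix(Fraction(4, 5), sure("country house"), sure("car"))
second_urn = mix(Fraction(4, 5), sure("city house"), sure("car"))
print(expected_utility(first_urn, u), expected_utility(second_urn, u))  # 23/5 23/5

# The same check with nonsure prospects: P and Q equally good, R a third prospect.
P = {"country house": Fraction(1, 2), "car": Fraction(1, 2)}
Q = {"city house": Fraction(1, 2), "car": Fraction(1, 2)}
R = {"car": Fraction(1, 3), "city house": Fraction(2, 3)}
r = Fraction(2, 5)
print(expected_utility(mix(r, P, R), u) == expected_utility(mix(r, Q, R), u))  # True
```

Exact fractions are used so that the equalities hold exactly rather than up to rounding.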

3.6. It may be useful to refer to the notation of 2.7, where the outcome X was a
joint occurrence of quantities $x_1, x_2, \ldots$ and was regarded as a vector $\{x_i\}$. The
utility of X (or, in the language of prospects, the utility of the sure prospect that
X occurs with certainty) is denoted by $u(x_1, x_2, \ldots)$. Disregard, as being kept constant,
all $x_i$ for $i > 3$. We say that commodity 3 is more complementary with
commodity 1 than with commodity 2, over some defined intervals, if

(3.6:1)  $$u(x_1 + h_1, x_2, x_3) = u(x_1, x_2 + h_2, x_3)$$

and

$$u(x_1 + h_1, x_2, x_3 + h_3) > u(x_1, x_2 + h_2, x_3 + h_3),$$

where the $h_i$ are positive. This definition of complementarity is in accord with the
one used in 2.7 for the case when the $x_i$ are continuous and u is twice differentiable,
as is seen by expanding the function u into a Taylor series about $(x_1, x_2, x_3)$ and
substituting into (3.6:1). On the other hand, in our example of the car and the two
different houses, the three $x_i$ can take only the values 0 and 1. If, in the continuous
case, the cross derivative $u_{13} = 0$, or if, in the general case,

$$u(x_1, x_2, x_3 + h_3) - u(x_1, x_2, x_3) = u(x_1 + h_1, x_2, x_3 + h_3) - u(x_1 + h_1, x_2, x_3),$$

we say that there is no complementarity (positive or negative) between commodities
1 and 3, or that the two are “independent.” As stated in 2.8, these definitions
presuppose that utility functions are linear transforms of each other.
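The Taylor-series remark can be spelled out as follows. This is a sketch, assuming u is twice differentiable with positive first derivatives, writing $u_i$ and $u_{ij}$ for its partial derivatives at $(x_1, x_2, x_3)$, and keeping terms up to second order in the $h_i$. The form of the criterion used in 2.7 is not repeated here; the derivation only shows what (3.6:1) reduces to.

$$
\begin{aligned}
u(x_1 + h_1, x_2, x_3) - u(x_1, x_2 + h_2, x_3) &\approx u_1 h_1 - u_2 h_2 + \tfrac{1}{2} u_{11} h_1^2 - \tfrac{1}{2} u_{22} h_2^2 = 0, \\
u(x_1 + h_1, x_2, x_3 + h_3) - u(x_1, x_2 + h_2, x_3 + h_3) &\approx u_1 h_1 - u_2 h_2 + \tfrac{1}{2} u_{11} h_1^2 - \tfrac{1}{2} u_{22} h_2^2 + (u_{13} h_1 - u_{23} h_2)\, h_3 \\
&= (u_{13} h_1 - u_{23} h_2)\, h_3 > 0 .
\end{aligned}
$$

Since $h_3 > 0$, the second condition of (3.6:1) requires $u_{13} h_1 > u_{23} h_2$, while the first gives $h_1 / h_2 \approx u_2 / u_1$ for small steps; so, to leading order, commodity 3 is more complementary with commodity 1 than with commodity 2 when $u_{13}/u_1 > u_{23}/u_2$. The same expansion turns the independence condition into $u_{13} h_1 h_3 \approx 0$, that is, $u_{13} = 0$.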

To apply this concept to the choice between prospects, remember that we then have,
as the argument of the utility function, not the vector of quantities of commodities
but the probabilities of mutually exclusive events, $p_1, \ldots, p_N$, where, in
particular, $p_1$ may be the probability that the vector of commodity quantities has
a certain value and $p_2$ may be the probability that this vector has another value.
It would be misleading to say that postulate IV asserts “independence” (in the
sense just stated) between any objects of choice themselves: the prospects P, Q, ...
are not (as the commodity quantities $x_1, x_2, \ldots$ are) coordinates of the space in
which the indifference surfaces are drawn.
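The discrete definitions of 3.6 (for commodity quantities, not prospects) can be checked mechanically for the car-and-houses example, where each $x_i$ is 0 or 1. In the Python sketch below, the utility `u` (in which the car is hypothetically worth an extra 1/2 when held together with the country house) and its transforms `w` and `v` are illustrative assumptions, as are the helper names. The last lines illustrate why the definition of independence presupposes that utility functions are linear transforms of each other: the increasing linear transform `w` preserves the independence of commodities 2 and 3, while the increasing nonlinear transform `v` does not.

```python
from fractions import Fraction

# Bundles (x1, x2, x3): x1 = country house, x2 = city house, x3 = car.


def more_complementary_with_1(u, x, h):
    """Condition (3.6:1): commodity 3 is more complementary with 1 than with 2."""
    (x1, x2, x3), (h1, h2, h3) = x, h
    equally_good = u(x1 + h1, x2, x3) == u(x1, x2 + h2, x3)
    return equally_good and u(x1 + h1, x2, x3 + h3) > u(x1, x2 + h2, x3 + h3)


def shifted(x, k, step):
    """Return bundle x with coordinate k (1-based) increased by step."""
    b = list(x)
    b[k - 1] += step
    return tuple(b)


def independent_of_3(u, x, k, h_k, h3):
    """General-case test of no complementarity between commodity k and commodity 3."""
    y = shifted(x, k, h_k)
    return u(*shifted(x, 3, h3)) - u(*x) == u(*shifted(y, 3, h3)) - u(*y)


def u(x1, x2, x3):
    # Hypothetical utility: the car adds more when held with the country house.
    return x1 + x2 + x3 + Fraction(1, 2) * x1 * x3


def w(x1, x2, x3):
    # Increasing linear transform of u.
    return 3 * u(x1, x2, x3) + 7


def v(x1, x2, x3):
    # Increasing but nonlinear transform of u (u >= 0 on these bundles).
    return u(x1, x2, x3) ** 2


x0, steps = (0, 0, 0), (1, 1, 1)
print(more_complementary_with_1(u, x0, steps))                              # True
print(independent_of_3(u, x0, 2, 1, 1), independent_of_3(w, x0, 2, 1, 1))  # True True
print(independent_of_3(v, x0, 2, 1, 1))                                     # False
```

Condition (3.6:1) itself compares utility levels only, so it survives any increasing transform; the independence condition compares utility differences, which is where the cardinal scale matters.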

3.7. On the other hand, the probabilities $p_1, \ldots, p_N$ are indeed “independent,”
in the sense that, if the precept of maximizing expected utility is always followed,
then

$$\frac{\partial^2 u(p_1, p_2, \ldots, p_N)}{\partial p_m \, \partial p_n} = 0, \qquad m, n = 1, \ldots, N,$$

because u is linear in its continuous argument, the vector $\{p_i\}$.
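A small exact computation illustrates the statement of 3.7. The sketch below assumes hypothetical utilities for N = 3 outcomes and replaces the derivative by an exact mixed second difference computed with Python's `fractions.Fraction`. The moral expectation, being linear in $(p_1, \ldots, p_N)$, gives zero for every pair m, n; for contrast, a hypothetical criterion equal to the square of the moral expectation, which is not linear in the probabilities, does not. The steps leave the probability simplex, which is harmless for a formula that is linear on all of $\mathbb{R}^N$.

```python
from fractions import Fraction
from typing import Callable, List, Sequence


def moral_expectation(p: Sequence[Fraction], utilities: Sequence[Fraction]) -> Fraction:
    """Expected utility as a function of the probability vector (p_1, ..., p_N)."""
    return sum((pi * ui for pi, ui in zip(p, utilities)), Fraction(0))


def mixed_second_difference(
    f: Callable[[List[Fraction]], Fraction], p: List[Fraction], m: int, n: int, h: Fraction
) -> Fraction:
    """Exact finite-difference analogue of d^2 f / (dp_m dp_n); m, n are 0-based."""

    def bump(q: List[Fraction], k: int) -> List[Fraction]:
        q = list(q)
        q[k] += h
        return q

    return (f(bump(bump(p, m), n)) - f(bump(p, m)) - f(bump(p, n)) + f(p)) / (h * h)


U = [Fraction(0), Fraction(1), Fraction(4)]           # hypothetical utilities of 3 outcomes
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # a point in the probability simplex
h = Fraction(1, 100)


def eu(q: List[Fraction]) -> Fraction:
    return moral_expectation(q, U)


def squared_eu(q: List[Fraction]) -> Fraction:
    # Hypothetical criterion that is not linear in the probabilities.
    return moral_expectation(q, U) ** 2


print(max(abs(mixed_second_difference(eu, p, m, n, h)) for m in range(3) for n in range(3)))  # 0
print(mixed_second_difference(squared_eu, p, 1, 2, h))  # 8 (= 2 * U[1] * U[2])
```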


4.  The  rule  of  long  run  success 

4.1. We shall now tentatively outline another proposition which may, under
certain conditions, be implied by the rule of maximizing moral expectation and
which appears (like postulate IV) intuitively more convincing than the rule itself.

For an integer $T > 0$, define a space $\mathcal{X}^{(T)}$ whose element $X^{(T)}$ represents a possible
time sequence of situations $x_0, x_1, \ldots, x_T$ (for example, a sequence of annual
profits or of balance sheets). Define a utility function $u_T(X^{(T)})$ such that if $X_1^{(T)}$
and $X_2^{(T)}$ are in $\mathcal{X}^{(T)}$ and $X_1^{(T)} \succeq X_2^{(T)}$, then $u_T(X_1^{(T)}) \ge u_T(X_2^{(T)})$. Define a closed
space $\mathcal{S}_T$ whose element $S_T$ represents a possible strategy, defined as a sequence
of functions $s_0, s_1, \ldots, s_{T-1}$, where $s_t = s_t(x_0, x_1, \ldots, x_t)$. Thus $S_T$ is a sequence
of rules of how to respond at given times to a given sequence of past situations.
Now define the probability that the strategy $S_T^*$ will be at least as successful
as $S_T'$:

(4.1:1)  $$\Pr\{u_T(X^{(T)}; S_T^*) \ge u_T(X^{(T)}; S_T')\} = \pi_T(S_T^*, S_T'), \text{ say.}$$


Now let T increase and consider sequences such as

$$\mathfrak{S}^* = (S_1^*, S_2^*, \ldots), \quad \mathfrak{S}' = (S_1', S_2', \ldots), \quad \text{etc.}$$


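Suppose that there exists a limit

$$\lim_{T \to \infty} \pi_T(S_T^*, S_T') = \pi(\mathfrak{S}^*, \mathfrak{S}'), \text{ say,}$$

for any two sequences $\mathfrak{S}^*$ and $\mathfrak{S}'$, and suppose that, for every $\mathfrak{S}'$, this limit is

(4.1:2)  $$\pi(\mathfrak{S}^*, \mathfrak{S}') = 1.$$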


Then the rule of long run success requires that the sequence $\mathfrak{S}^*$ be chosen [or, if
two or more sequences $\mathfrak{S}^*$ exist that satisfy (4.1:2), one of them must be chosen].
This  corresponds  to  the  common  sense  definition,  “The  best  policy  is  the  one  that 
succeeds  in  the  long  run.” 
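Definition (4.1:1) can be estimated by simulation once a model of outcomes is fixed. The Python sketch below is a toy that is not part of the text: each period's profit $x_t$ is 1 or 0 with a probability depending on a chosen action (`WIN_PROB`), a strategy is a rule mapping the past situations $x_0, \ldots, x_t$ to an action, $u_T$ is taken to be the average profit over T periods, and outcomes under the two compared strategies are simulated independently (the text does not specify their joint distribution). All names and numbers here are hypothetical.

```python
import random
from typing import Callable, List

# Hypothetical model: profit in a period is 1 with this probability, else 0.
WIN_PROB = {"A": 0.6, "B": 0.5}

# A rule maps the history (x_0, ..., x_t) to an action; using the same rule at
# every t gives a strategy S_T = (s_0, ..., s_{T-1}).
Rule = Callable[[List[float]], str]


def simulate(rule: Rule, T: int, rng: random.Random) -> List[float]:
    """Return one random sequence x_0, x_1, ..., x_T generated under the rule."""
    history = [0.0]  # assumed starting situation x_0
    for _ in range(T):
        action = rule(history)
        history.append(1.0 if rng.random() < WIN_PROB[action] else 0.0)
    return history


def u_T(history: List[float]) -> float:
    """Hypothetical utility of a whole sequence: average profit over periods 1..T."""
    return sum(history[1:]) / (len(history) - 1)


def pi_T(s_star: Rule, s_prime: Rule, T: int, trials: int = 1000, seed: int = 1) -> float:
    """Monte Carlo estimate of pi_T(S*_T, S'_T) in (4.1:1), with independent draws."""
    rng = random.Random(seed)
    hits = sum(
        u_T(simulate(s_star, T, rng)) >= u_T(simulate(s_prime, T, rng))
        for _ in range(trials)
    )
    return hits / trials


def always_a(history: List[float]) -> str:
    return "A"


def always_b(history: List[float]) -> str:
    return "B"


def b_after_loss(history: List[float]) -> str:
    """History-dependent rule: play "B" right after a zero-profit period, else "A"."""
    return "B" if len(history) > 1 and history[-1] == 0.0 else "A"


for T in (1, 10, 100, 400):
    print(T, pi_T(always_a, always_b, T))
```

With these hypothetical numbers, $\pi_T$ for `always_a` against `always_b` is 0.6 + 0.4 × 0.5 = 0.8 at T = 1 and approaches 1 as T grows, because the average profit under each strategy has variance of order 1/T. The history-dependent rule `b_after_loss` shows that the same estimator applies to strategies that respond to past situations.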

4.2. We should like to know conditions under which the rule of maximizing
moral expectation implies that the rule of long run success is satisfied. As a mere
example that may start a discussion among those better qualified, we shall impose
a (probably unnecessarily strong) condition upon the sequence of utility functions

$$u_1(x_1), \; u_2(x_1, x_2), \; \ldots, \; u_T(x_1, x_2, \ldots, x_T)$$

with means

$$Eu_1, \; Eu_2, \; \ldots, \; Eu_T$$

and variances

$$\sigma_1^2, \; \sigma_2^2, \; \ldots, \; \sigma_T^2 .$$


We do not assume the successive random variables $u_1, u_2, \ldots$ to be statistically
independent. But we make the assumption that the variance $\sigma_T^2$ tends to zero as
$T \to \infty$. Then $u_T - Eu_T$ converges in probability to zero. Therefore the difference

$$u_T(X^{(T)}; S_T^*) - u_T(X^{(T)}; S_T')$$

differs from

$$Eu_T(X^{(T)}; S_T^*) - Eu_T(X^{(T)}; S_T')$$

by a quantity that converges in probability to zero.

(See, for example, [1, especially pp. 253-255].) This difference is nonnegative if the
rule of maximizing moral expectation is followed, that is, if for every T a strategy
$S_T^*$ is chosen satisfying

$$Eu_T(X^{(T)}; S_T^*) \ge Eu_T(X^{(T)}; S_T'),$$

where $S_T'$ is any other strategy in $\mathcal{S}_T$. Then

$$\pi(\mathfrak{S}^*, \mathfrak{S}') = 1$$

for every $\mathfrak{S}'$, and the rule of long run success is satisfied.
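The convergence step can be made explicit with Chebyshev's inequality. The sketch below adds an assumption not made in the text: that the expected-utility advantage $Eu_T(X^{(T)}; S_T^*) - Eu_T(X^{(T)}; S_T')$ stays at or above a fixed $\delta > 0$ for all T. Write $u_T^*$ and $u_T'$ for $u_T(X^{(T)}; S_T^*)$ and $u_T(X^{(T)}; S_T')$, with variances $\sigma^{*2}_T$ and $\sigma'^2_T$. If $u_T^* < u_T'$, at least one of the two utilities lies $\delta/2$ or more from its expected value, so

$$
\begin{aligned}
1 - \pi_T(S_T^*, S_T') = \Pr\{u_T^* < u_T'\}
&\le \Pr\{|u_T^* - Eu_T^*| \ge \delta/2\} + \Pr\{|u_T' - Eu_T'| \ge \delta/2\} \\
&\le \frac{4\,(\sigma^{*2}_T + \sigma'^2_T)}{\delta^2} \longrightarrow 0 \qquad (T \to \infty).
\end{aligned}
$$

Hence $\pi(\mathfrak{S}^*, \mathfrak{S}') = 1$ under this extra assumption. The bound gives no information when the advantage is allowed to shrink to zero, for example when $S_T'$ has the same expected utility as $S_T^*$.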

We have used here, merely to illustrate the proposed approach, a crucial assumption
that is hardly plausible: that the variance of the utility function of outcomes
over a horizon T tends to zero as the horizon increases. This assumption is also
unnecessarily strong, as it proves more than is required: for the limiting probability
$\pi(\mathfrak{S}^*, \mathfrak{S}')$ to be equal to 1, it is not necessary that each of the two compared
utilities converge separately to its respective expected value.

An alternative assumption might be that of statistical independence between
utility functions over successive horizons, that is, between $u_1(x_1), u_2(x_1, x_2), \ldots$.

But  this  is  even  less  reasonable.  It  is  possible  that  no  plausible  conditions  exist 
under  which  the  rule  of  maximizing  moral  expectation  satisfies  the  rule  of  long  run 
success  as  defined. 

5.  Summary 

The  rule  of  maximizing  the  expected  value  of  utility  was  shown  to  imply  that 
utility  functions  of  prospects  (that  is,  of  alternative  distributions  of  outcomes  of 
strategies)  are  linear  transforms  of  each  other  and  are  linear  in  the  probabilities  of 
outcomes. The rule is equivalent to the postulate that indifferent prospects can be
substituted for each other, provided certain other, weak postulates are granted.
Finally, an attempt was made to relate the rule of maximizing the expected value
of utility to a rule of aiming at long run success. This required redefining outcomes,
strategies, and utilities as time sequences. The strategies discussed included
those of statisticians and businessmen and can be conceived to include human decisions
in general. At no point was it claimed that reasonable behavior is actually
practiced  by  men:  the  paper  is  a  study  in  consistent  sets  of  norms,  not  an  empirical 
study. 